1.0 PRINCIPAL CONSIDERATIONS IN EXPERIMENTAL DESIGN AND DATA
ANALYSIS

The purpose of this section is to present the main steps taken in setting
up an experiment to furnish data on a hypothesis and then analyzing these data
in order to obtain information leading to acceptance or rejection of the
hypothesis.

Figure 1 shows the main quantitative factors which affect the result of 
a statistical hypothesis test on the data furnished by an experiment. These are: 

1. Measurement error 

2. Subject-to-subject variation 

3. Day-to-day variation 

4. Sample size (number of subjects)

5. Number of measurements on a subject 

6. Number of measurements taken over a period of days.

Figure 1 shows schematically the effect of these factors on the outcome
of the experiment and of the post-experiment analysis. That is, after an
experiment is performed, a statistical analysis is generally carried out to test
one or more hypotheses. The results of this analysis are, for each such
hypothesis, (1) a decision to accept or reject the hypothesis, and (2) a numeric
"confidence" in the correctness of this acceptance or rejection. This
confidence is expressed by two sets of parameters: the significance level of the
hypothesis test, and confidence intervals about the parameters used in the
statement of the hypothesis. The significance level is the probability of
rejecting the hypothesis when it is true.

The measurement error is usually normally distributed about a mean,
which is ideally equal to zero. This means that individual errors of
measurement vary randomly, sometimes above the mean and sometimes below.
The average of a sequence of measurements will tend to be closer to the
mean as the number of such measurements increases. Thus, if successive
measurements on a single subject are statistically independent of each other
and normally distributed with mean of zero, and if it is possible to make a
series of such measurements, the probability that the average of all such
measurements has error quite close to zero is higher than for the case of only
one such measurement.

Following is a table of such probabilities for the case of a measuring
device whose errors are normally distributed with mean of zero and standard
deviation (denoted by the symbol $\sigma$) equal to 1:

  Number of                P(|error_n| < 1)      P(|error_n| < 0.1)
  Measurements (n)

        1                      0.6826                 0.0796
        2                      0.8414                 0.1114
        3                      0.9164                 0.1350
        4                      0.9544                 0.1586
        5                      0.9742                 0.1742
       10                      0.9984                 0.2510
       30                     >0.9999                 0.4176
      100                     >0.9999                 0.6826
      200                     >0.9999                 0.8414
      400                     >0.9999                 0.9544
     1000                     >0.9999                >0.9999


Here the term error$_n$ is defined as follows: If $e_1, e_2, \ldots, e_k$ are
the errors of the first, second, ..., kth measurements, respectively, then

  $$\mathrm{error}_1 = e_1; \quad \mathrm{error}_2 = (e_1 + e_2)/2; \quad \ldots; \quad \mathrm{error}_n = (e_1 + e_2 + \cdots + e_n)/n.$$
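
The tabulated probabilities can be recomputed with a short sketch. It assumes,
as the text does, independent errors that are normal with mean zero and
standard deviation 1, so that the averaged error is normal with standard
deviation $1/\sqrt{n}$. The function name `prob_mean_error_within` is a
hypothetical helper introduced here, not part of the report. Exact values may
differ from the printed table in the third or fourth decimal place.

```python
import math


def prob_mean_error_within(n: int, bound: float, sigma: float = 1.0) -> float:
    """P(|error_n| < bound), where error_n is the mean of n independent
    N(0, sigma^2) measurement errors, i.e. error_n ~ N(0, sigma^2 / n)."""
    z = bound * math.sqrt(n) / sigma
    # For a standard normal Z, P(|Z| < z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Same values of n as in the table; columns are the bounds 1 and 0.1.
    for n in (1, 2, 3, 4, 5, 10, 30, 100, 200, 400, 1000):
        print(f"{n:5d}  {prob_mean_error_within(n, 1.0):.4f}  "
              f"{prob_mean_error_within(n, 0.1):.4f}")
```

The second column shows why many repetitions are needed before the averaged
error is reliably within a tenth of the single-measurement standard deviation.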

The second quantitative factor mentioned above, viz. subject-to-subject
variation, is the natural variation between subjects of any quantity.
For instance, some subjects may lose more bone than others, or more body
water than others, under the influence of bed rest or space flight. It seems
reasonable to assume that this variation is an expression of the variance of a
normally distributed random variable. For example, the loss of trabecular
bone sustained by male subjects of a certain age and physical condition after
three weeks of space flight may average 16% with a standard deviation of 3%,
with the losses normally distributed about the mean. The same type of
loss for female subjects is likely also normally distributed, quite possibly with
a different mean, but with much the same standard deviation.

More generally, any statistical parameter, such as average total body
water (TBW) loss or bone loss, may be estimated from any sample of one or
more subjects. If the numbers $X_1, \ldots, X_n$ represent TBW losses or bone
losses for each of n subjects, the estimate of average TBW or bone loss for
the population from which the subjects were taken is calculated as

  $$\bar{X} = (X_1 + X_2 + \cdots + X_n)/n.$$


The experimenter may naturally wish to know how close to the true value
of mean TBW loss or bone loss he or she has come by making this estimate.
This question is answered statistically by means of confidence intervals. That
is, for a given number n of observations, the experimenter can calculate
intervals about the quantity $\bar{X}$ which contain the true mean value (of the
observed quantity) with any known probability. Specifically, given n
independent observations $X_1, X_2, \ldots, X_n$ of a normally distributed random
variable X, a $100(1 - \alpha)\%$ confidence interval about $\bar{X}$ is

  $$\left( \bar{X} - \frac{S}{\sqrt{n}}\, t_{n-1}(1 - \alpha/2), \;\; \bar{X} + \frac{S}{\sqrt{n}}\, t_{n-1}(1 - \alpha/2) \right).$$

That is, the probability is $1 - \alpha$ that the true mean value (denoted by
$\mu$) for X is contained in this interval. Here the $t_{n-1}$ values may be found
in any table of the t distribution. The quantity S is defined by

  $$S = \left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}.$$

An example may make this clearer: If n = 10 and the ten observed values are

  4.8, 5.2, 5.0, 5.5, 4.7, 4.9, 5.4, 5.1, 4.8, 4.6,

then

  $$\bar{X} = 5.0 \quad \text{and} \quad S = 0.298.$$

Then if we want the 95% confidence interval for $\mu$, we set $\alpha = 0.05$ and
look up $t_9(0.975)$; this value is 2.262. Thus, we have

  $$\frac{S}{\sqrt{n}}\, t_9(0.975) = \frac{0.298}{3.162}\,(2.262) = 0.2132.$$

Therefore, we have 95% confidence that the true value of the mean $\mu$ of the
random variable X is in the interval

  (5 - 0.2132, 5 + 0.2132), or (4.7868, 5.2132).

In other words, the probability that the mean of X is in this interval is 0.95.

If, however, we had had only five observed values, say 4.8, 5.2, 5.0,
5.3, 4.7, we would have $\bar{X} = 5.0$ as before, and S = 0.255. This time, since
$t_4(0.975) = 2.776$, we would get

  $$\frac{S}{\sqrt{n}}\, t_4(0.975) = \frac{0.255}{2.236}\,(2.776) = 0.3166,$$


and our 95% confidence interval is now (4.6834, 5.3166). So the 95%
confidence interval has widened with decreased number of observations. This is
true in general of confidence intervals: as the sample size gets smaller, the
interval gets wider. Or, if the interval is held constant, the confidence level
decreases. It is, of course, intuitively clear that this should be so; the greater
the number of observations we make, the higher should be our confidence that the
true mean will fall into a given interval about our estimate, and the narrower
should be the interval about our estimate for a given confidence level.
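
The two worked intervals can be checked with a minimal sketch. The t quantiles
(2.262 and 2.776) are taken from the text rather than computed, and the
function name `t_confidence_interval` is a hypothetical helper. Because the
text rounds S before multiplying, the half-widths computed here may differ
from the printed ones in the fourth decimal place.

```python
import math
import statistics


def t_confidence_interval(values, t_quantile):
    """Return (mean, S, half_width) for the interval mean +/- t * S / sqrt(n).

    t_quantile is t_{n-1}(1 - alpha/2), supplied by the caller from a
    t table (the text uses 2.262 for n = 10 and 2.776 for n = 5).
    """
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # divides by n - 1, like S in the text
    half_width = t_quantile * s / math.sqrt(n)
    return mean, s, half_width


ten = [4.8, 5.2, 5.0, 5.5, 4.7, 4.9, 5.4, 5.1, 4.8, 4.6]
five = [4.8, 5.2, 5.0, 5.3, 4.7]

for sample, t_q in ((ten, 2.262), (five, 2.776)):
    mean, s, hw = t_confidence_interval(sample, t_q)
    print(f"n={len(sample)}  mean={mean:.3f}  S={s:.3f}  "
          f"interval=({mean - hw:.4f}, {mean + hw:.4f})")
```

Halving the sample from ten to five widens the interval both through the
larger t quantile and through the smaller $\sqrt{n}$ in the denominator.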

The next quantitative factor is day-to-day variation. Most physiological
quantities are subject to some variation from day to day (examples are
blood pressure, TBW, etc.). These variations appear to be random, and thus such
physiological quantities may be treated as random variables in the same way as
above; confidence intervals may again be calculated for the true mean of a
quantity over a period of days, if we can make the assumption that the variation
is only statistical and does not indicate a time change in the mean itself.

Each of the three quantitative factors just described will have an adverse
effect on the confidence the experimenter may have in the conclusions he or she
may draw from analyzing the data obtained by experiment. The greater the factors
(i.e., the larger the $\sigma$ of the corresponding distributions), the more adverse
this effect will be.

Conversely, the next three factors to be discussed, viz. sample size
(number of subjects), number of measurements (repetitions), and the number of
measurements taken on successive days, have a favorable effect on the
confidence the experimenter may have in the conclusions drawn. This effect
counteracts the adverse effect of the first three factors, and if the sample size
and the number of repetitions and daily measurements can be raised high enough,
the experimenter can achieve any desired level of confidence in these conclusions.
Here the concept of "confidence in conclusions drawn" is denoted on the
right of Figure 1; it is usually expressed by confidence limits on means, since
means are usually used in expressing statistical hypotheses.

To make this clearer, an example will be presented, using TBW loss
as the subject of a statistical hypothesis. Here there are only two treatments,
zero-g and 1-g, and hence a t-test is appropriate. For the present case the t
statistic has the form

  $$t = \frac{\bar{X}}{S/\sqrt{n}},$$

where n is the number of subjects, $\bar{X}$ is the sample mean of the TBW losses
for n different subjects:

  $$\bar{X} = (X_1 + X_2 + \cdots + X_n)/n,$$

and S is the sample standard deviation:

  $$S = \left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}.$$


The question the experimenter desires to answer is: Is there a real loss in TBW,
and if so, how much, and what confidence can I have in these conclusions, given
my set of data, sample size, etc.? (Either there is a real loss, or any apparent
loss is really only due to random variation in the data.) This question is
translated into statistical language as a hypothesis, namely:

  $$\mu \le 0, \quad \text{where } \mu = (\text{mean TBW loss due to zero-g environment}).$$

The hypothesis is clearly equivalent to stating that there is no loss.
It is statistically tested by calculating the t statistic $\bar{X}/(S/\sqrt{n})$ and

  a. rejecting the hypothesis if $\bar{X}/(S/\sqrt{n}) > t_{n-1}(1 - \alpha)$,

or

  b. accepting the hypothesis if $\bar{X}/(S/\sqrt{n}) \le t_{n-1}(1 - \alpha)$.


As an example, suppose we have TBW loss values for three subjects
of 0.6 liters, 1.1 liters and 0.5 liters. These data give

  $$\bar{X} = 0.7333, \quad S = 0.3215, \quad \bar{X}/(S/\sqrt{n}) = 3.9511.$$

For $\alpha = 0.05$, we have $t_{n-1}(1 - \alpha) = t_2(0.95) = 2.920$. Thus we would, in
this case, reject the hypothesis at the 0.05 significance level; i.e., we would
reject the assertion that average loss of TBW in the zero-g environment is zero or
less. This is equivalent to concluding that there is a real loss in TBW, induced
by zero-g conditions, for the general population from which we drew the subjects
for the experiment.

For such a case we also have $100(1 - \alpha)\%$ confidence that the true mean
TBW loss satisfies the following inequality:

  $$\mu \ge \bar{X} - \frac{S}{\sqrt{n}}\, t_{n-1}(1 - \alpha).$$

For $\alpha = 0.05$, this is translated into saying that

  $$P\{\mu > 0.7333 - (0.1856)(2.920)\} = P\{\mu > 0.1913\} = 0.95;$$

that is, the probability that $\mu > 0.1913$ is equal to 0.95. Thus, given the three
measurements 0.6, 1.1, 0.5, without even knowing the true mean or standard
deviation of the distribution, we can say with 95% confidence that the true mean
is at least 0.1913.

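
The one-sided test and lower confidence bound for the three TBW losses can be
reproduced with a brief sketch. The critical value 2.920 is $t_2(0.95)$ as
quoted in the text, and `one_sided_t_test` is a hypothetical helper name. Small
differences from the printed figures in the last decimal place are expected,
since the text rounds intermediate values.

```python
import math
import statistics


def one_sided_t_test(losses, t_critical):
    """Test the hypothesis mu <= 0 against mu > 0.

    Returns (t_statistic, reject, lower_bound), where lower_bound is
    Xbar - (S / sqrt(n)) * t_critical, the one-sided confidence bound.
    t_critical is t_{n-1}(1 - alpha), read from a t table by the caller.
    """
    n = len(losses)
    mean = statistics.mean(losses)
    s = statistics.stdev(losses)  # sample standard deviation, n - 1 divisor
    std_err = s / math.sqrt(n)
    t_stat = mean / std_err
    return t_stat, t_stat > t_critical, mean - std_err * t_critical


t_stat, reject, lower = one_sided_t_test([0.6, 1.1, 0.5], t_critical=2.920)
print(f"t = {t_stat:.4f}, reject mu <= 0: {reject}, lower bound = {lower:.4f}")
```
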
More generally, given a particular true mean for a change in a
parameter (e.g., bone loss or TBW loss) induced by spaceflight, and given
particular values for the first three factors on the left-hand side of the block
diagram, the experimenter may wish to know how many measurements he or she must
have (1) to ensure that the results of a hypothesis test will call the change
statistically significant (i.e., reject the statistical hypothesis of no change,
mentioned above), and (2) to assure the experimenter of a particular level of
confidence that the true mean is greater than a given value. The answer to the
question posed by (1) is given by the set of curves in Figure 2, and the answer
to the question posed by (2) is given by the curves in Figure 3.

The abscissa of the curves in Figure 2 is the ratio $\mu/\sigma$, where $\mu$ is
the mean of the quantity being measured, and $\sigma$ is the composite standard
deviation of this quantity. That is, this value of $\sigma$ is the standard deviation
for measurements of a quantity pertaining to one subject, measured possibly
several times each day over a number of days. We define the following
quantities:

  $\sigma_1^2$ = Subject-to-subject variance of the quantity being measured.

  $\sigma_2^2$ = Variance introduced by the measuring device, or
                 "reproducibility."

  $\sigma_3^2$ = Variance introduced by day-to-day variation of a measured
                 quantity in the same subject.

  $n_1$ = Number of subjects.

  $n_2$ = Number of times a measurement is taken on one subject in one day.

  $n_3$ = Number of days on which measurements are taken on one subject.
          It is assumed that $n_2$ is the same for all these days.

If the quantities are averaged over all measurements on each day, the
variance due to measurement error is reduced to $\sigma_2^2/n_2$. If these quantities
are averaged over the total number of days on which measurements are taken,
the combined variance due to both measurement and day-to-day variation is

  $$\frac{\sigma_3^2}{n_3} + \frac{\sigma_2^2}{n_2 n_3}.$$

Thus, the quantity measured for one subject and averaged as described
above is normally distributed with mean $\mu$ and standard deviation equal to

  $$\sigma = \left[ \sigma_1^2 + \frac{\sigma_3^2}{n_3} + \frac{\sigma_2^2}{n_2 n_3} \right]^{1/2}.$$

This is the $\sigma$ used in the abscissa value $\mu/\sigma$ in Figure 2. The values
labeled $n_1$ in the figure represent the number of subjects, $n_1$.

The ordinate of the curves in Figure 2 represents the probability that
the hypothesis test will call a change significant; i.e., reject the hypothesis
that $\mu \le 0$. For example, if average TBW loss has the same mean equal to 1
(i.e., $\mu = 1$) over three days and over all $n_1$ subjects (note that this does not
say that actual TBW loss is the same for all subjects; it simply says that the
subjects may be considered as belonging to the same statistical population for
the three days, and that the mean for this population is equal to $\mu$), but
$\sigma_1 = \sigma_2 = \sigma_3 = 0.5$, with three measurements on each of the three days
($n_2 = n_3 = 3$), then from the formula for $\sigma$ above, we have

  $$\sigma = \left[ (0.5)^2 + \frac{(0.5)^2}{3} + \frac{(0.5)^2}{9} \right]^{1/2} = (0.5)(1.20) = 0.6.$$

Thus, $\mu/\sigma = 1.6667$, and the curves in Figure 2 show that the $\mu \le 0$
hypothesis will be rejected with a probability of about 0.5[illegible] if
$n_1 = 3$, 0.[illegible]5 if $n_1 = 5$, 0.91 if $n_1 = 7$, and 0.98 if $n_1 = 9$.
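
The composite standard deviation in this example can be checked with a small
sketch. It assumes the composite formula
$\sigma = [\sigma_1^2 + \sigma_3^2/n_3 + \sigma_2^2/(n_2 n_3)]^{1/2}$, which is the reading
that reproduces the value 0.6 quoted above. The function name
`composite_sigma` is hypothetical.

```python
import math


def composite_sigma(sigma1, sigma2, sigma3, n2, n3):
    """Standard deviation of one subject's averaged measurement.

    sigma1: subject-to-subject standard deviation
    sigma2: measuring-device (reproducibility) standard deviation
    sigma3: day-to-day standard deviation
    n2:     measurements per subject per day
    n3:     number of days
    """
    variance = sigma1**2 + sigma3**2 / n3 + sigma2**2 / (n2 * n3)
    return math.sqrt(variance)


mu = 1.0
# First example: all three sigmas 0.5, three measurements on each of three days.
sigma = composite_sigma(0.5, 0.5, 0.5, n2=3, n3=3)
print(f"sigma = {sigma:.4f}, mu/sigma = {mu / sigma:.4f}")

# With very many measurements and days, only the subject-to-subject term remains.
sigma_limit = composite_sigma(0.5, 0.5, 0.5, n2=10**6, n3=10**6)
print(f"limiting sigma = {sigma_limit:.4f}, mu/sigma = {mu / sigma_limit:.4f}")
```

The second call illustrates that repeating measurements can remove the
device and day-to-day terms but never the subject-to-subject term.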

On the other hand, if $\sigma_1$ is still 0.5, but we take so many measurements
over so many days that the effect from $\sigma_2$, $\sigma_3$ may be neglected, we shall
have $\sigma$ now approximately equal to 0.5, so that $\mu/\sigma = 2$. For this case the
probabilities rise to about 0.62, 0.90, 0.97, and 0.995 for $n_1$ = 3, 5, 7, 9,
respectively.

Or, if the $\sigma$'s all decreased to 0.4167 for the first case of three
measurements and three days, these latter numbers would again result.

The following conclusion is evident from the second example: the
probability that can be obtained by increasing $n_2$ and $n_3$ is bounded by the
value of $n_1$. The second example is tantamount to assuming that
$n_2, n_3 = \infty$. The only way to raise the probabilities higher than these values
is to raise $n_1$. On the other hand, it is clear that we can achieve as high a
probability as we like by increasing $n_1$.

The curves in Figure 3 represent the confidence that $\mu \ge \bar{X} - a$, where
$\bar{X}$ is the sample mean of measurements on $n_1$ subjects and a is some positive
number. Here the confidence is entirely independent of the actual value of
$\bar{X}$; the only dependence is on the values of a, $\sigma$ and $n_1$, where $\sigma$ is
defined as above.

If we suppose that $\mu = 1$ and $\sigma = 0.6$ as in the first example above, and
that a = 1, then we have that $\sigma = 0.6a$ and thus, the confidence lies between the
curves $\sigma = 0.5a$ and $\sigma = a$. Therefore, it is about 95%, even for a sample of
only one subject. That is, the confidence that $\mu \ge 0$ is about 95%.

On the other hand, if $\sigma = 0.5$ and a = 1/2, we get $\sigma = a$, and thus the
confidence that $\mu \ge 1/2$ is 84%, 92%, 96%, and 97.5% for $n_1$ = 1, 2, 3, 4,
respectively. For this same case, if $\sigma$ increases to 1.0, then $\sigma = 2a$, and
the confidence that $\mu \ge 1/2$ will be 69%, 76%, 80%, 84%, 87%, and 89% for
$n_1$ = 1, 2, 3, 4, 5, and 6, respectively.
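
The percentages quoted from Figure 3 are close to what a normal-theory
calculation gives if $\sigma$ is treated as known, namely a confidence of
$\Phi(a\sqrt{n_1}/\sigma)$ that $\mu \ge \bar{X} - a$. That this is how the curves were
constructed is an assumption made here, not something the text states. The
sketch below, with the hypothetical helper `confidence_mean_at_least`, computes
these values for comparison with the figures read from the curves.

```python
import math


def confidence_mean_at_least(a, sigma, n1):
    """Phi(a * sqrt(n1) / sigma): confidence that mu >= Xbar - a when the
    mean over n1 subjects is N(mu, sigma^2 / n1) and sigma is known."""
    z = a * math.sqrt(n1) / sigma
    # Standard normal cumulative distribution function via erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


a = 0.5
for sigma in (0.5, 1.0):  # the cases sigma = a and sigma = 2a
    values = [100.0 * confidence_mean_at_least(a, sigma, n1) for n1 in range(1, 7)]
    print(f"sigma = {sigma}: " + ", ".join(f"{v:.1f}%" for v in values))
```
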

In this way, the confidence values may be determined by using the curves
for any given values of $n_1, n_2, n_3, \sigma_1, \sigma_2, \sigma_3$.


2.0 APPLICATION OF MEASUREMENT ERROR ANALYSIS TO SPACEFLIGHT
STUDIES

2.1 COMPUTER TOMOGRAPHY MEASUREMENT ERRORS
Background 

In the work by Elsasser*, the author presents results from two types of
experiments designed to give an indication of the accuracy that can be expected
from measurements by computed tomography. The first type of experiment
consists of measurements on objects designed to simulate actual bone both as to
shape and as to absorptive properties relative to the radiation used in computed
tomography. The materials used are aluminum and plexiglas, which are also used in
models for the photon absorption method. The author writes that these materials
provide a satisfactory approximation of physiological conditions. For modeling
of trabecular bone and marrow, a mixture of aluminum powder and PMMA cement
("Beracryl®") is used. The author also presents considerable detail on the actual
structure of these objects for modeling bones, but these will not be given here.

The advantages of carrying out measurements on such a model are, of
course, that the accuracy of the method may be tested by comparing the results
with the known density of the object being measured.

The results of the tests for these models are given on p. 86 of Elsasser's
dissertation, in Tables 5.16, 5.17, 5.18, 5.19, and 5.20. In Table 5.16 are
given results for the following models:

a) Model to simulate the total bone: Plexiglas, aluminum tubing and PMMA/
   aluminum cylinder, an illustration of which is given in Figure 5.7.

b) Model without "compact bone": The aluminum tubes in a) are replaced by
   plexiglas tubes of the same dimensions.

c) Model without any "soft tissues": The outer plexiglas cylinder is omitted,
   and only the aluminum tubing, with the PMMA/aluminum cylinders inserted,
   is measured.

d) "Compact bone" alone: Only the aluminum tubes are measured.

*Quantifizierung der Spongiosadichte an Röhrenknochen mittels Computer-
Tomographie (Quantification of Trabecular Bone Density in Tubular Bones by
Computed Tomography).

e) Models without "spongy bone": Instead of the PMMA/aluminum mixture,
   the interior of the aluminum tubing contains plexiglas cylinders. The
   thickness of the aluminum tube wall, i.e., the "compact bone" thickness,
   varies as follows:

   e1) Wall thickness of tubing = 1.5 mm.
   e2) Wall thickness of tubing = 3.0 mm.
   e3) Wall thickness of tubing = 4.0 mm.
   e4) Wall thickness of tubing = 5.0 mm.


The numbers given in the table are not densities, but rather "mean linear
absorption coefficients" of the materials being measured. These coefficients have
units of cm$^{-1}$. The model configuration a) (line a) in the table) gives trabecular
bone density in these units for the model of an actual bone. The measured result
is the value 0.677 ± 0.007 for a "true value" of 0.675 ± 0.005. The author notes
that the ± 0.005 is included because the "true" value of the linear absorption
coefficient for the PMMA/aluminum powder mixture cannot be precisely determined.

Comparing this measured value with the "true value" shows that for
simulation of the actual bone by the aluminum/plexiglas/PMMA/aluminum powder
model, the method of computed tomography appears to measure trabecular bone
density to a very high degree of accuracy; the relative error is only +0.3%.

Models b) and c) correspond to trabecular bone without any compact bone
and without any tissue outside the compact bone, respectively. These are, of
course, deviations from physiological conditions so severe that they will never
arise in applications with astronauts as subjects. Nevertheless, even with such
severe deviations, the relative error is bounded in absolute value by the level of
2.1%. Models e1) through e4) simulate the case where there is no trabecular
bone; nevertheless, the given results are still labeled "trabecular bone density"
(Spongiosa-Dichte) in the dissertation. It is not explicitly stated what trabecular
bone density means for this case.

Table 5.18 shows the results of measurements to determine
"reproducibility." This is simply a compilation of results of repeated
measurements on the same object 10 successive times without changing the position
of the plane of measurement. Thus, an idea of the variability of the measurement
is provided, with the result that here the estimate, S, of the standard deviation
of the measured density is 0.[illegible] of the density. These measurements were
all carried out on the same day.

Table 5.19 shows results of 10 measurements taken at intervals of 10 mm
along the longitudinal axis of the model. Here the author cites a standard
deviation of 1.2% (on p. 88), which he attributes to inhomogeneities of the
density of the PMMA/aluminum powder mixture.

Long range reproducibility (over 12 months or more) is given by Table 5.20. 
The author comments that there is no systematic error detectable due to age of 
the radiation source used in the measurements. 

The increased standard deviation of these measured values is attributed to
density variations in the model and a decreasing statistical accuracy as a function
of age of the radiation source (p. 88, section 5.2.5).

Before going on to general measurements on human subjects, we mention a
remark of the author on p. 97 to the effect that the observed differences between
the digital tomographically determined trabecular bone densities of normal and
osteoporotic femurs are greater by a factor of 10 than those in the total observed
mineral content.

The second type of experiments carried out were those on human
subjects. Here the location of measurement was on the radius of the right arm at
a distance of 10% of the ulna length from the ulna styloid process. The density of
trabecular bone is defined as the mineral value of all matrix elements of the area
that is located in the interior of the radius, equidistant from the outer edge of
the bone, and that comprises 50% of the total bone cross-sectional area.

The first experiment discussed was carried out on 96 subjects, of which
the majority fall in the two categories 5-16 years and 20-40 years of age. The
author says that the presence of most of the subjects in these two age groups is
random.

The results of this experiment are presented in Figure 6.4 and Table 6.5.
The latter gives numerical estimates for mean and standard deviation of the
subgroups: 14 girls, 23 boys; 13 women, 46 men; 37 boys and girls, 59 men and
women; 27 girls and women, 69 boys and men; and, finally, the total of 96
subjects. The mean trabecular bone density is found to be about 0.765, with an
estimated standard deviation of 0.120. Perhaps the most significant result is
that there is no apparent difference in the measured results as a function of age.

The reproducibility experiments in human subjects yield the results of most
concern to planners of the Space Shuttle experiments. These results are
presented and discussed in Section 6.3.3 on p. 113 of Elsasser's dissertation. The
author mentions in a general way that the reproducibility error is, as in the case
of the nonhuman models, a function of the positioning of the plane of measurement
along the arm. He cites some examples of measurements made on humans where
the reproducibility was of the same order of magnitude for both humans and the
models; i.e., about 1.5%. He then mentions that for all subjects one may not
necessarily hope for such a good reproducibility. He cites two prerequisites for
good reproducibility: (1) a high degree of cooperation on the part of the subject,
and (2) the trabecular bone density in the area being investigated must not change
by more than two percent for each one percent change in measuring position along
the longitudinal axis of the ulna.

In the experience of the author, the proper measurement location can seldom
be found with an accuracy better than 2.5 mm. This means that the
reproducibility depends not only on the length of the arm, but also on the density
gradient along the longitudinal axis of the bone. As an example of a sharply
changing trabecular bone density, he presents Figure 6.1.4. Here a shift of 3 mm in
one direction or another leads to a change in trabecular bone density from +6% to
-7% of the value at 10% of ulna length.

The author sums up by stating that under optimal conditions the trabecular
bone density has a reproducibility of about ± 1.5%, but under less favorable
conditions two measurements on the same subject may differ by more than 10%. A
more precise estimate of reproducibility for a single subject may only be made with
knowledge of the axial density gradient; this requires several measurements. One
possible way to improve reproducibility is to take a plaster cast of the arm and to
locate the measurement plane by use of this cast on each subsequent measurement.
The only drawback the author sees with this method is in measuring children,
primarily because their arms tend to grow in size between measurements over a
period of months. Thus, there appear to be no foreseeable problems in applying
this method to measurements on Space Shuttle astronauts.

In Section 6.3.4 (p. 117) the author mentions another problem which tends
to decrease the accuracy of the measurements: movements of the subject during
the time in which measurements are being taken. He states that it is impossible
to eliminate entirely this source of error, and says that the two sources of error,
positioning and movement, result in about 10% of all measurements being termed
worthless.

Application

It seems clear that the primary source of error in the computed tomography
measurement method is due to reproducibility, primarily because of uncertainty
in positioning the plane of measurement along the arm. However, as the author
notes, the likely magnitude of the reproducibility error may be estimated by making
several measurements 1 mm apart along the arm of the subject. Thus, if the
accuracy cannot be improved, at least the experimenter may obtain an idea of the
magnitude of the error for each subject.

However, measurement errors are commonly assumed to be normally
distributed about some mean. If this assumption is made, it implies that any number
of successive measurements on the same subject may be used to approximate the
true value of trabecular bone density more accurately than a single measurement.
Under this assumption, the mean of a number n of such measured values, i.e.,
the quantity

  $$\bar{X} = (X_1 + X_2 + \cdots + X_n)/n,$$

is distributed with the same mean as the $X_i$, but with standard deviation
$\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of a single measurement.

Thus, theoretically the standard deviation of an estimate of trabecular bone
density may be brought arbitrarily close to zero, simply by taking a sufficiently
large number of successive, mutually independent measurements.

One problem with this approach is that the estimate thus obtained for the
mean error may have some bias; i.e., it may not be zero, but may lie on one side
or the other of zero. In the case of a plaster cast (to help in determining the
location of the measurement plane), a small bias could easily ensue for the
measurement of density, but since it would be essentially the same for preflight
and postflight, it would vanish for the measurement of the spaceflight-induced
change in density. If the position for measurement is always selected by only one
person, it is conceivable that some bias could be present, but here too, it would
tend to cancel out if the same person makes the selection for both preflight and
postflight measurements.

However, if one person selects the position for preflight measurements and
another for postflight measurements, and their location biases (if any) reinforce
each other (e.g., the preflight person tends to locate the measurement plane one
mm too close to the wrist and the postflight person tends to locate it too close to
the elbow), then serious systematic errors in measured density change may well
result, especially for arms with high density gradients. Hence, it appears
advisable either to have location done by the same person on both occasions, or to
make several measurements with the location done by different persons for each
measurement, both before and after.

Because of this and other considerations (e.g., Elsasser states that the
statistical properties of the radiation source change slightly with time; by this
he apparently means a progressive increase of the standard deviation of
measurements, i.e., reproducibility error; his statement of no systematic error
means that the mean error is zero), it seems reasonable to assume that while we
can bring the standard deviation of successive measurements quite close to zero, we
probably cannot, in real life, bring it arbitrarily close to zero, as was mentioned
above for the theoretical case. Another practical consideration supporting this
reservation is that we only have a relatively short time in which to perform
postflight measurements, since the bone density is expected almost immediately to
start increasing back toward the 1-g level.

So to be conservative, we might assume that by taking 9 or 10 preflight
measurements and 9 or 10 postflight measurements (both under such conditions
that we may be sure that density change during measurement is negligible;
probably all measurements on a subject should be taken on the same day), we could
reduce the variance of the mean of the measurements by a factor of at least 8
(instead of 9 or 10, as in the theoretical case), which means a reduction in the
standard deviation of each (preflight/postflight) density estimate by a factor of
$\sqrt{8}$. If $\sigma$ represents reproducibility, or standard deviation of the actual
densities, then the standard deviation of the difference between the densities is
given by $\sigma$ times $\sqrt{2}$. Thus, this assumption implies a reduction in the
standard deviation of the actual difference between zero-g and 1-g densities by a
factor of two.
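
The arithmetic behind the factor of two can be written out as a short worked
equation. It assumes, as the paragraph does, that the preflight and postflight
means are independent and that each has variance $\sigma^2/8$; the symbols
$\bar{X}_{\text{pre}}$ and $\bar{X}_{\text{post}}$ are introduced here only for
illustration.

```latex
\begin{align*}
\operatorname{SD}(\bar{X}_{\text{pre}}) = \operatorname{SD}(\bar{X}_{\text{post}})
  &= \frac{\sigma}{\sqrt{8}} \approx 0.354\,\sigma, \\
\operatorname{SD}(\bar{X}_{\text{post}} - \bar{X}_{\text{pre}})
  &= \sqrt{\frac{\sigma^2}{8} + \frac{\sigma^2}{8}}
   = \sqrt{2}\cdot\frac{\sigma}{\sqrt{8}}
   = \frac{\sigma}{2}.
\end{align*}
```

Under these assumptions the measured change has a standard deviation equal to
half the single-measurement reproducibility $\sigma$.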

While this factor may seem to be low, it should be kept in mind that
experiments on actual subjects may show it to be somewhat higher. Also, since good
estimates of reproducibility error may be obtained by taking computed tomographic
estimates at several adjacent points on the arm spaced equidistant from the desired
measuring point, subjects who will have very large reproducibility $\sigma$'s can be
identified. It seems likely that measurements can be carried out on a sufficiently
large randomly selected sample of the population from which subjects are chosen
to determine the distribution of density gradients in the radius over the total
population with a high level of confidence. If this distribution then indicated
that, say, only 2% of the population have gradients implying $\sigma$ of more than 9%,
such people could be excluded from the experiment without strongly impinging on the
representativeness of the sample finally chosen; i.e., it would still represent at
least 98% of the population.

Examples 

An example of trabecular bone density differences between an immobilized
(for three weeks, due to a fracture) arm and the opposite arm of 14 children is
given in the reference: Dynamics of Trabecular and Compact Bone Mineral of the
Radius after Immobilization of the Upper Arm in Children, by Elsasser, Exner,
Prader and Anliker. Here there is a rather large sample standard deviation (13%
of trabecular bone density in the healthy arm) for trabecular bone density loss;
this leads to a 95% confidence interval of ± 6.21 about the sample mean of 17%;
i.e., the probability is 0.95 that the true mean of the population lies somewhere
in the interval (10.79, 23.21). Further, the fact that comparisons were made
with the healthy arm of the same subject tends to confuse deviations in bone loss
of the type which would occur in a zero-g environment with deviations which occur
as a consequence of the variation in physical activity of the healthy arms of the
subjects in this experiment. Also, unless care was taken in selection of the
subjects, some of the variation may be due to the possibility that some of the arms
fractured were dominant, and others nondominant.

Nonetheless, it seems instructive to see what the results of a t-test would
be for a population having a true mean of 17% and 13% standard deviation. Here,
if we assume that the reproducibility standard deviation is (after we have reduced
it as much as we can) 2%, then the σ for the curves in Figure 2 is

    σ = [13² + 2²]^(1/2) ≈ 13.15,

and hence μ/σ, the abscissa of these curves, is about 17/13.15 ≈ 1.29. The
curves then show that for such a high standard deviation for the population, the
probabilities that a t-test will deliver a verdict of μ > 0 for the general
population are 0.39, 0.67, 0.84 and 0.94 for number of subjects equal to 3, 5, 7, 9,
respectively, where the significance level of the test is 0.05 (probability of
rejecting the hypothesis μ ≤ 0 when it is true). If the reproducibility of the computed
tomography method should be so bad that its standard deviation is 10%, then the
σ becomes

    σ = [13² + 10²]^(1/2) ≈ 16.40,

so that μ/σ ≈ 17/16.40 ≈ 1.04, and the probabilities for the t-test are now 0.33,
0.56, 0.77 and 0.89 for number of subjects equal to 3, 5, 7, and 9, respectively.

These results indicate that for such a large σ due to subject-to-subject
variation, even errors introduced by what is close to the worst possible
reproducibility (10% + only one measurement) make relatively little difference in the
probability of rejection/acceptance of the hypothesis that the average loss in
trabecular bone over the entire population is greater than zero.
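
The combination of subject-to-subject and reproducibility standard deviations used in the two cases above can be checked with a short calculation. This is a minimal sketch, assuming the two error sources are independent so that their variances add; the function name `composite_sigma` is illustrative and not taken from the report.

```python
import math

def composite_sigma(sigma_subject, sigma_repro):
    """Combine two independent standard deviations (root sum of squares)."""
    return math.sqrt(sigma_subject**2 + sigma_repro**2)

MEAN_LOSS = 17.0       # sample mean loss, percent (from the example above)
SIGMA_SUBJECT = 13.0   # subject-to-subject standard deviation, percent

# Reproducibility standard deviations of 2% and 10%, as in the text.
ratios = {}
for sigma_repro in (2.0, 10.0):
    sigma = composite_sigma(SIGMA_SUBJECT, sigma_repro)
    ratios[sigma_repro] = MEAN_LOSS / sigma
    print(f"repro sd {sigma_repro:4.1f}%: sigma = {sigma:.2f}, "
          f"mu/sigma = {ratios[sigma_repro]:.2f}")
```

Raising the reproducibility standard deviation from 2% to 10% changes the ratio only modestly, because the 13% subject-to-subject term dominates the sum of squares.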

To get an idea how the t-test will behave if the σ due to subject-to-subject
variation is somewhat less than in the example above, we present another
example.

In the introduction to his dissertation, Elsasser says that total trabecular
bone loss seems to vary from 10% to 41% for subjects who are inactive for 3 to 4
weeks. If these are extremes, i.e., if 99% of all subjects suffer losses over this
period between these limits, and if, further, the losses are normally distributed,
then typical values for σ and mean loss are 6% and 25%, respectively, of total
trabecular bone density. Here if measurement standard deviation is again rather
high (≈ 10%) and we can reduce it by a factor of two, then the σ value for Figure 2
becomes

    σ = [(0.06)² + (0.05)²]^(1/2) ≈ 0.0778.

Since μ is 25% we get μ/σ ≈ 0.25/0.0778 ≈ 3.22, and Figure 2 shows then that
the t-test will say μ > 0 with probabilities 0.89, 0.98, 0.999, 0.999+ for numbers
of subjects = 3, 5, 7, 9, respectively. Again the computed tomographic
reproducibility error plays a relatively minor role in determining what the test will do.
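
The step from the quoted extremes to a standard deviation can be sketched as follows, assuming (as the text does) normally distributed losses with 99% of subjects between 10% and 41%. The helper name `normal_from_central_range` is hypothetical; the quantile comes from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

def normal_from_central_range(low, high, coverage):
    """Mean and standard deviation of a normal distribution whose central
    `coverage` fraction lies between `low` and `high`."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 2.576 for 99%
    mean = (low + high) / 2.0
    sigma = (high - low) / (2.0 * z)
    return mean, sigma

mean_loss, sigma_loss = normal_from_central_range(10.0, 41.0, 0.99)
print(f"mean = {mean_loss:.1f}%, sigma = {sigma_loss:.2f}%")
```

This gives a mean near 25% and a standard deviation near 6%, consistent with the typical values quoted above.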

Next, consider a single subject with rather high reproducibility σ₂, viz.
σ₂ = 0.09, or 9% of total density, and suppose we can only reduce this by a factor
of 2 by repeated measurements, etc. It may be of interest to know, under these
circumstances, the probability that the measured value is at least equal to 90% of
μ (the actual value which would be returned by computed tomography if there were
no reproducibility error) or 75%, 50%, 25% of μ, or simply the probability that the
measured value is greater than zero. Curves for these probabilities are given in
Figure 4. The abscissa here is the value of μ (μ = 0.1 means μ is 10% of
trabecular bone density), the mean bone loss.

FIGURE 4



We see from Figure 4 that if the true loss is 5%, the probability that the
measured loss will be at least 5%, 4.5%, 3.75%, 2.5%, 1.25%, or 0 is 0.500,
0.520, 0.609, 0.709, 0.800, and 0.870, respectively. For a 10% true loss, the
corresponding numbers are 0.5, 0.587, 0.712, 0.807, 0.953, and 0.98. Thus,
for a single subject with σ₂ = 9% we could feel quite confident (95.3% confidence)
that the measured value would be at least 2.5%.

If on the other hand we have a subject with σ₂ = 2%, or one for whom we can
bring σ₂ down to 2% by repeated measurements, etc., the curves will look as in
Figure 5. Here an actual loss of 5% gives probabilities of measured values at least
5%, 4.5%, 3.75%, 2.5%, 1.25% and 0 of 0.500, 0.579, 0.734, 0.894, 0.969, and
0.994, respectively. Thus, the probability that the measured value is ≥ 1.25% is here
0.969, so that a 5% change seems quite reliably detectable.



FIGURE 5. For σ = 0.02 (2%).


2.2 BASAL METABOLISM MEASUREMENT ERRORS

The following questions have been posed on this subject:

In order to detect, at the 95% confidence level, a 5%, 10%, 20%, or 30%
difference in basal metabolic rate, how many subjects do I need, how many
repetitive measurements should I make on a single subject, how many times over
a period of days should I repeat the measurements, and what statistical analysis
techniques should I use?

The answer to these questions is given in terms of the parameters n₁, n₂,
n₃, σ₁, σ₂, σ₃, all of which were previously defined. First, the question of
detecting the above mentioned changes with confidence will be discussed,
using some curves presented below, and then the question of analysis will be
discussed using curves in Figure 2, presented previously.

For basal metabolic rate, the quantities σ₁, σ₂, σ₃ may, of course, have
different values for each of the parameters which define this rate. We shall
discuss the problem of detecting a change in one of these parameters and mention
that the statistical features of this problem are identical for any one of the
parameters.

The quantities σ₁, σ₂, σ₃ represent standard deviations due to (1)
subject-to-subject variation, (2) measurement-to-measurement variation, and
(3) day-to-day variation, respectively. Since we are here only interested in
changes in a parameter, we may expect to be measuring values applying to
post-flight conditions (i.e., conditions which obtain at the end of flight, not those for a
day or two after completion of flight), and those applying to preflight conditions.
The measurements will presumably be made over a sufficiently short period as
to preclude any significant variation with time, because time variations will affect
the measured value of a difference. Hence, the value of σ₃ here is zero.

If such parameters have never previously been measured during spaceflight,
the experimenter has no way of knowing the value of σ₁, although he or she may be
able to obtain a fairly close estimate for this through bed rest studies. If there are
data available from previous flights, the value of the composite σ (given by
contributions from the individual standard deviations and the particular numbers of
measurements), i.e., the σ for the change in a parameter from 1-g to zero-g,
may be estimated by the formula for S given previously. This formula is also
used, of course, in estimating σ₁ from bed rest data.

The user should usually have a fairly good estimate for σ₂ from the
manufacturer of the measuring instrument, or some other source.

The procedure for measurements is to take the measurements of all subjects
under conditions which the experimenter considers to represent satisfactorily the
1-g environment, and then take identical measurements of the same parameters
immediately after return from spaceflight, while the parameters still are as close
as possible to the zero-g values. Or, if it is feasible to take measurements daily
aboard the spacecraft, this may be done. This affords the advantage of permitting
detection of any appreciable time trends in the parameters under the influence of
spaceflight.

Before the actual experiments, if measurements are to be taken as described
above, for any given values of μ (to be defined below) and σ (already mentioned
above), certain things may be said about the likely outcome of the experiment, and
the confidence the experimenter may have in his or her results. For this case,
σ = [σ₁² + σ₂²/n₂]^(1/2). If μ is the mean change in a parameter for
the population from which the subjects were selected, then a particular number
n of subjects will be required for 95% confidence that the sample mean of the
measured values of a parameter:

    X̄ₙ = (X₁ + X₂ + ... + Xₙ)/n

(where X₁, X₂, ... are the measured values and the total number of such values
is n) is at least 90%, 75%, 50%, or 25% of the true mean μ. It can be shown
mathematically that this number n depends only on the ratio μ/σ. The
functional dependence of n on μ/σ is depicted in Figure 6. Here it is assumed that the
error in the measured value is a normal random variable with mean zero and
standard deviation σ; the standard deviation of the sample mean of n measurements
is then σ/√n.

The tables from which the curves in Figure 6 were plotted are the following:

0.9μ:

    μ/σ    0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    8.0
    n      1089   273    121    68     44     31     22     17     4

0.75μ:

    μ/σ    0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0
    n      175    44     20     11     7      5      4      3

0.5μ:

    μ/σ    0.5    1.0    1.5    2.0    2.5    3.5
    n      44     11     5      3      2      1

0.25μ:

    μ/σ    0.5    1.0    1.55   2.2
    n      20     5      2      1

0:

    μ/σ    0.499  0.522  0.55   0.58   0.623  0.67   0.74   0.83   0.96   1.17
    n      11     10     9      8      7      6      5      4      3      2

It is thus seen that for a case where μ = σ (i.e., a rather large standard
deviation σ) we have 95% confidence that if five subjects are chosen (n = 5), then
X̄₅ will be at least equal to 0.25μ, where μ is the true population change for the
parameter we are estimating, and that even with only three subjects we still may
have about 95% confidence that X̄ will at least not be negative.
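
The tabulated sample sizes are consistent with a one-sided normal bound. As a sketch under that assumption (the report does not state its formula at this point): if the sample mean is normal with mean μ and standard deviation σ/√n, then it is at least kμ with 95% confidence when n ≥ [z/((1 − k)(μ/σ))]², with z ≈ 1.65. The function name `subjects_needed` is hypothetical, and because the sketch always rounds up, a few results differ by one from the tabulated entries.

```python
import math

Z_95 = 1.65  # one-sided 95% point of the standard normal, rounded

def subjects_needed(fraction, mu_over_sigma, z=Z_95):
    """Smallest n giving the stated confidence that the sample mean
    is at least `fraction` times the true mean mu."""
    raw = (z / ((1.0 - fraction) * mu_over_sigma)) ** 2
    # Small tolerance so floating-point noise does not push an exact
    # integer (e.g. 1089) up to the next one.
    return max(1, math.ceil(raw - 1e-9))

for fraction in (0.9, 0.75, 0.5, 0.25, 0.0):
    row = [subjects_needed(fraction, r) for r in (0.5, 1.0, 2.0)]
    print(f"at least {fraction:.2f} mu: {row}")
```

The required n scales as 1/(μ/σ)², which is why halving μ/σ roughly quadruples the number of subjects in every table.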


As for "trade-off" between repeated measurements and number of subjects,
it is clear from the form of the tables and the curves that making the number of
subjects large enough will enable us to detect any change, no matter how small
μ/σ may be. For example, even if n₂ (number of measurements) is only 1, and
if σ₁ is quite large, say = 3, and σ₂ is 4, so that

    σ = [σ₁² + σ₂²/n₂]^(1/2) = [3² + 4²/1]^(1/2) = 5,

and if μ is only 2.5, so that μ/σ = 0.5, then we can still achieve a 95% confidence
level that the measured X̄ₙ ≥ 0.9μ by increasing n to 1089 subjects. However,
if we hold σ₁ and n constant at 3 and increase the number n₂ of measurements
to 1000, 2000, or 5000, we see by the formula

    σ = [σ₁² + σ₂²/n₂]^(1/2)

that our σ will still be at least 3. Thus, μ/σ is no more than 0.8333..., and
thus the lowest curve applies; we may then only say that X̄ will be greater than or
at least equal to zero.
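
The limit on what repeated measurements can achieve is easy to see numerically. The sketch below assumes the composite standard deviation has the form [σ₁² + σ₂²/n₂]^(1/2), with σ₁ the subject-to-subject term and σ₂ the measurement term averaged over n₂ repeats; this form matches the numbers in the example (3 and 4 combining to 5 for a single measurement). The function name is illustrative.

```python
import math

def composite_sigma(sigma_subject, sigma_meas, n_meas):
    """Standard deviation of one subject's averaged value when the
    measurement error is averaged over n_meas repeated measurements."""
    return math.sqrt(sigma_subject**2 + sigma_meas**2 / n_meas)

MU = 2.5  # true mean change, same units as the standard deviations
for n_meas in (1, 1000, 2000, 5000):
    sigma = composite_sigma(3.0, 4.0, n_meas)
    print(f"n_meas = {n_meas:5d}: sigma = {sigma:.4f}, mu/sigma = {MU / sigma:.4f}")
```

However many repeats are taken, σ approaches but never drops below the subject-to-subject value of 3, so only adding subjects can improve matters further.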

As for statistical analysis techniques, the data from an experiment are often
used to test the hypothesis that the true mean change in a parameter for a population
(e.g., the population of all healthy subjects between 28 and 38 years of age) is greater
than zero or ≤ 0. This is done by a "t-test," which is explained in the text
accompanying Figure 2. This figure presents the number of subjects necessary to ensure
a particular probability that the hypothesis of no change, or of a change in opposite
direction to that of the true change, will be rejected.

As an example of the use of these curves, suppose we have μ/σ = 0.5, as
mentioned above. Here it is clear from the curves in Figure 2 that many more than
9 subjects would be necessary to ensure a probability of 0.95 that a hypothesis
contrary to the real change would be rejected. If, however, μ/σ = 2, then only about
six subjects would be necessary for this, and with μ/σ = 4.3, then only three
subjects would be necessary. Again, we emphasize that such estimates for μ
and σ might be obtained from bed rest studies or, if available, data for changes
in the parameters of interest from past space flights.

2.3 BODY WATER MEASUREMENT ERRORS

Problem:

A new method* is proposed to measure total body water inflight based on
ethanol dilution and non-invasive breath analysis. Assuming the changes in total
body water during Shuttle Spacelab missions are similar to those observed in the
nine Skylab crewmembers (see Figures 7 and 8), will this method provide the
precision necessary to detect the expected water losses?

These data (Figures 7 and 8) serve to illuminate the problem of estimating
what can be expected from future experiments to measure this parameter. We
summarize these data below, and then apply them to the problem of estimating
sample size and making other considerations for future experiments.

Data for TBW loss for a single crewman:

Here the mean change seems to be about 0.9 liter. Since 95% of the
day-to-day measurements seem to fall within ± 0.5 liter of this postulated mean, it
might be reasonable to assume that 2σ ≈ 0.5 liter, or σ ≈ 0.25 liter, since
this is the case if the day-to-day variation is normally distributed about the mean.

Data for TBW losses for the entire Skylab crew:

It looks here as if the mean TBW loss is equal to 1.4 liters, and as if σ is
around 0.3 (since the vertical line for two days after launch is about 0.6 liters in
length).

Since these values are radically different from the case of a single crewman,
it appears possible that TBW loss might vary quite strongly with a person's normal
(1-g) TBW level; i.e., change in TBW level might be quite strongly correlated with
the 1-g TBW level. Thus, it might be advisable to use percentage of the 1-g TBW
level as the parameter of interest for statistical analysis, rather than absolute
change in liters. If there is such a correlation, the measurement results might
be appreciably changed, since the numbers in Table 1 of the article by Loeppky,
et al. (Total Body Water and Lean Body Mass Estimated by Ethanol Dilution) show
a TBW range of 36.2 liters to 67.2 liters for a sample of 35 human subjects.

* Loeppky, et al. (1977) Appl. Physiol. Respirat. Environ. Exercise Physiol. 42:
  803-808.

However, even the measurements in Figure 8 appear to show a striking
uniformity in the individual TBW change, as indicated by the apparent standard
deviation of 0.3 liter. If we suppose that these data yield sample standard
deviation S = 0.3, then a 95% confidence interval for μ, the actual mean of the
population, is

    [1.1694, 1.6306].

Furthermore, with these values a t-test at the 0.05 level would reject the
hypothesis of change contrary to the true change, since

    X̄/(S/√n) = 1.4/(0.3/3) = 14.0,

which is greater than 1.860 = t₈(0.95).
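
The interval and test statistic above can be reproduced as follows. This is a sketch assuming nine subjects (8 degrees of freedom) and the standard tabulated Student t quantiles for 8 degrees of freedom, 2.306 (two-sided 95%) and 1.860 (one-sided 95%), which are hard-coded here rather than computed.

```python
import math

N = 9            # Skylab crewmembers
MEAN = 1.4       # apparent mean TBW loss, liters
S = 0.3          # apparent sample standard deviation, liters
T8_975 = 2.306   # Student t, 8 d.f., 97.5th percentile (table value)
T8_95 = 1.860    # Student t, 8 d.f., 95th percentile (table value)

std_err = S / math.sqrt(N)
ci = (MEAN - T8_975 * std_err, MEAN + T8_975 * std_err)
t_stat = MEAN / std_err

print(f"95% CI: [{ci[0]:.4f}, {ci[1]:.4f}]")
print(f"t = {t_stat:.1f}, exceeds one-sided critical value: {t_stat > T8_95}")
```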

A study by Culebras, et al. (A Comparative Study of TBW as Measured by
Isotope Dilution and Body Desiccation in the Rat. Federation Proc. 35(3):450,
1976) reports TBW/wt = 0.702 by desiccation and 0.714 by HTO. From Table 1
of the article by Loeppky, et al., the ETH measured value is 0.717 and that for
HTO is 0.735, for a sample of 35 subjects. Therefore, there appears to be no
significant difference in accuracy between the ETH and HTO methods, since the
HTO method deviates positively from desiccation by about 1.68%, and the ETH
method deviates negatively from the HTO method by about 2.45%, or from
desiccation negatively by about 0.77%, so that the ETH method may be slightly more
accurate, if we regard desiccation as a sort of absolute norm.

For purposes of predicting how future experiments might turn out in using
the ETH method (assuming that the parameter values μ = 1.4, σ = 0.3 mentioned
above are valid), we may apply first the tables used in generating Figure 6. Here
the graph of changes in TBW of the Skylab crew of 9 indicates that the σ value
mentioned above incorporates both subject-to-subject and day-to-day variation,
so that μ/σ = 1.4/0.3 = 4.67. If to be conservative we drop this to 4.00, we
see that we need 17 subjects to attain 95% confidence that the measured estimate
will be at least 90% of the true value of μ. To attain this level of confidence
that the estimate will be at least 0.75μ, we need only three subjects. For 0.5μ,
0.25μ, or 0 we need only one subject.

In other words, given a population TBW loss of 1.4 liters, and σ due to
subject-to-subject variation and day-to-day variation of a little over 0.3 liters,
the question, "What loss could I detect, and how many subjects do I need to do it?"
is answered by saying: "I would be able, with 95% confidence, to estimate a loss
of at least 0.9 (1.4) = 1.26 liters with 17 subjects, at least 1.05 liters with three
subjects, and 0.70 liters with only one subject."

Now let us postulate a less favorable scenario; namely, all parameters the
same as before, but μ = 0.7 liters, rather than 1.4 liters, as previously. With
this assumption, μ/σ will be a bit higher than 2.0. If we conservatively assume
μ/σ = 2.0, the tables will now give 68, 11, 3, 2, 1 subjects needed in order to
assert with 95% confidence that we shall obtain at least 0.63, 0.53, 0.35, 0.18,
and 0 for the estimated population TBW losses, respectively.
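
The two scenarios can be laid side by side with a small lookup. The sample sizes below are copied from the figures quoted in the text for μ/σ = 4.0 and μ/σ = 2.0; the dictionary layout and names are illustrative only.

```python
# Fractions of the true mean loss that the estimate should reach.
FRACTIONS = (0.9, 0.75, 0.5, 0.25, 0.0)

# Subjects needed for 95% confidence, keyed by the assumed mu/sigma.
SUBJECTS = {
    4.0: (17, 3, 1, 1, 1),
    2.0: (68, 11, 3, 2, 1),
}

def detectable_losses(mean_loss):
    """Loss (liters) corresponding to each fraction of the true mean."""
    return [round(f * mean_loss, 3) for f in FRACTIONS]

for mean_loss, ratio in ((1.4, 4.0), (0.7, 2.0)):
    for n, loss in zip(SUBJECTS[ratio], detectable_losses(mean_loss)):
        print(f"mean {mean_loss} L, n = {n:2d}: at least {loss:.3f} L")
```

Halving the expected loss from 1.4 to 0.7 liters raises the requirement at the 90% level from 17 to 68 subjects, while the coarser thresholds remain within reach of a small crew.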

Turning to the problem of estimating the outcome of the t-test, let us assume
again the former case where the parameters are μ = 1.4 and σ = 0.3. If we
again suppose that μ/σ = 4.0, the curves in Figure 2 give, for three subjects, a
probability of about 0.93 that the t-test will reject the hypothesis that μ ≤ 0.
For 5, 7, or 9 subjects, this probability rises above 0.99.

With the second assumption made above, viz. that μ is only 0.7 liters, but
the other parameters are unchanged, so that μ/σ may be taken to be 2.0, the
curves give, for three subjects, a probability around 0.66 that the t-test will
reject the hypothesis that μ ≤ 0. For 5, 7, or 9 subjects, this probability rises
to about 0.92, 0.98, and 0.99, respectively.

3.0 CONCLUSION


In conclusion, the foregoing work represents a detailed discussion, first of
the general aspects of experimental design as applicable to the Space Shuttle
experiments, and then of statistical aspects particular to each of several of the
proposed biomedical experiments for Space Shuttle. In retrospect, one impression
seems to stand out somewhat more than any other. This is that, in the literature
which was furnished to aid in statistical analysis, there was only one set of
information which described results from previous actual flights in enough detail to
form an idea of the statistical behavior which might be expected for
spaceflight-induced changes in the parameters of interest. This is the set of information
pertaining to TBW levels. As a result, the statistical commentary for all the other
experiments had to assume a rather general character, largely in the form of
curves and other information which should enable the experimenter to predict
results for planned experiments only if he or she supplies statistical parameter
values based either on previous spaceflight data or bed rest studies, or on the
experimenter's subjective judgement.

For the TBW experiments, on the other hand, the information supplied
permitted some fairly definite predictions to be made about what the results might be
for future experiments. Here too, of course, the information presented for the
other experiments can still be used to evaluate a variety of hypothetical scenarios
which the experimenter may find useful in making predictions or decisions. It is
hoped and suggested that past available data will be closely investigated (if this
has not already been done) for any possible application in future experimental
designs.



